-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDAP-15361] Add Schema handling in Wrangler #645
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to JIRA int the PR title
wrangler-api/src/main/java/io/cdap/wrangler/api/schema/SchemaFieldGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-api/src/main/java/io/cdap/wrangler/api/schema/SchemaFieldGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/directives/column/Rename.java
Outdated
Show resolved
Hide resolved
wrangler-api/src/main/java/io/cdap/wrangler/api/schema/SchemaUpdater.java
Outdated
Show resolved
Hide resolved
wrangler-api/src/main/java/io/cdap/wrangler/api/schema/SchemaUpdater.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/schema/Constants.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/schema/Constants.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-service/src/main/java/io/cdap/wrangler/service/directive/AbstractDirectiveHandler.java
Outdated
Show resolved
Hide resolved
wrangler-service/src/main/java/io/cdap/wrangler/service/directive/AbstractDirectiveHandler.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
if (designTime && schema != null) { | ||
Schema directiveOutputSchema = directive.getOutputSchema(schema); | ||
schema = directiveOutputSchema != null ? directiveOutputSchema | ||
: generateOutputSchema(schema, cumulativeRows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should introduce a feature flag for this with default enabled.. If schema generator fails on some customer schema, there should be a way to disable it
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/executor/RecipePipelineExecutor.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/utils/OutputSchemaGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/utils/DirectiveOutputSchemaGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/utils/DirectiveOutputSchemaGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/utils/DirectiveOutputSchemaGenerator.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/wrangler/utils/DirectiveOutputSchemaGenerator.java
Outdated
Show resolved
Hide resolved
Map<String, String> types = new LinkedHashMap<>(); | ||
|
||
Schema outputSchema = TRANSIENT_STORE.get(TransientStoreKeys.OUTPUT_SCHEMA) != null ? | ||
TRANSIENT_STORE.get(TransientStoreKeys.OUTPUT_SCHEMA) : TRANSIENT_STORE.get(TransientStoreKeys.INPUT_SCHEMA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases OUTPUT_SCHEMA will be null? This "else" part seems problematic for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially only input schema is set in transient store (when workspace created for ex)
Background
Currently, Wrangler infers the CDAP schema of the data where necessary during design-time (UI display, creating plugin/applying directives). This leads to inference errors and information loss. Hence, a mechanism for directives to correctly update schema after transformation is implemented and input schema information is not lost.
Changes
getOutputSchema
method to Executor interface (Directives can use this to return custom schema after transform)TransientStore
and accessed when necessary.specification
and AbstractDirectiveHandler'sgenerateExecutionResponse
method now use the output schema instead of inferringgetOutputSchema
, the schema is generated as a fallback mechanismNOTE: the schema changes are only applicable in design time (when Wrangler runs as a service, not in the plugin)